Abstract

Document images undergo various degradation processes. Numerous models of these
degradation processes have been proposed in the literature. In this paper we propose
a model-based restoration algorithm. The restoration algorithm first estimates the
parameters of a degradation model and then uses the estimated parameters to construct a
lookup table for restoring the degraded image. The estimated degradation model is used
to estimate the probability of an ideal binary pattern, given the noisy observed pattern.
This probability is estimated by degrading noise-free document images and then
computing the frequency of corresponding noise-free and noisy pattern pairs. This conditional
probability is then used to construct a lookup table to restore the noisy images. The
impact of the restoration process is then quantified by computing the decrease in OCR
word and character error rates.

We find that, given the estimated degradation model parameter values, the restoration
algorithm decreases the character error rate by 16.1% and the word error rate by
7.35%. In some categories of degradation (e.g. model parameters that give rise to broken
characters) there is a 41.5% reduction in character error rate and a 20.4% reduction in
word error rate.
the National Science Foundation under Grant IIS9987944.


1  Introduction 


Document images are usually corrupted by various types of noise during document
generation and copying processes. We wish to design a filter to restore a class of document
images that have similar structural features and degradation conditions. A traditional
approach to image restoration is to use linear filters [Jai89]. Although linear filters are
mathematically simple, their use usually results in distortion of many important image
characteristics. In this paper we propose an algorithm to create a lookup table that can
be used for restoring degraded document images.

The issue of morphological filter design has been studied by many researchers.
Dougherty [Dou92] proposed a method of characterizing the optimal binary
morphological filter in terms of the Matheron representation. Using the Matheron representation,
any binary morphological filter can be expressed as a union of binary erosions. The
filter design procedure is thus essentially the problem of finding structuring elements
that yield statistically optimal representations. To mitigate the computational burden
of filter design, Loce [LD92] adds constraints such as the number of erosions, window
size, and structuring element libraries to reduce the search. As a result, his filter design
is suboptimal. Schonfeld and Goutsias [SG91] consider the set-difference distance as a
measure of comparison between images, and by using this function, they prove that the
class of alternating sequential filters is a set of parametric, smoothing morphological
filters that best preserves the crucial structure of input images in the least mean difference
sense. Liang and Haralick [LH96] present a method of restoring document images
degraded by subtractive or additive noise, given a constraint on the size of the filters. The
improvement of their algorithm is shown by the increased accuracy of an OCR system.

One of the common limitations of the above-mentioned algorithms lies in the lack of
prior statistical information or an adequate image noise model, which makes them
computationally complex. This suggests that greater improvement of restoration algorithms
may be achievable by using an image noise model.

A survey of document image degradation models proposed in the literature can be
found in [Bai99]. We use the model proposed by Kanungo et al. [KHP94, KHB+00] for
our restoration algorithm.

2  Document  Degradation  Model 

Our degradation model [KHP94] has six parameters: $\Theta = (\eta, \alpha_0, \alpha, \beta_0, \beta, k)$. We model
the probability of a pixel flipping from foreground to background or vice versa as an
exponential function of its distance from the nearest boundary pixel. The foreground
and background 4-neighbor distances are computed using a standard distance transform
algorithm. The flipping probabilities of foreground and background pixels are controlled
by $\alpha_0, \alpha$ and $\beta_0, \beta$ respectively. The parameters $\alpha_0, \beta_0$ are the initial values of the
exponentials, and the decay speeds of the exponentials are controlled by the parameters
$\alpha, \beta$. Parameter $\eta$ is the constant probability of flipping for all pixels. Parameter $k$ is
the size of the disk used in the morphological closing operation. This operation
simulates the correlation introduced by the point-spread function of the optical system.
The procedure for degrading an ideal binary image is as follows:
Figure 1: (a) A typical ideal image; (b) Degraded version of (a) with parameters
(1.0, 0.7, 1.0, 3.0); (c) Degraded version of (a) with parameters (1.0, 3.0, 1.0, 0.7).

1. Compute the distance $d$ of each pixel from the nearest character boundary.

2. Flip each foreground pixel with probability

$$p(0 \mid 1, d, \alpha_0, \alpha) = \alpha_0 e^{-\alpha d^2} + \eta.$$

3. Flip each background pixel with probability

$$p(1 \mid 0, d, \beta_0, \beta) = \beta_0 e^{-\beta d^2} + \eta.$$

4. Perform a morphological closing operation with a disk structuring element of
diameter $k$.


Figure 1 illustrates ideal and degraded images with different model parameters. Note
that the two degraded images differ in the speed of decay of the exponential functions.
If $\alpha < \beta$, more foreground pixels change to background, so the images appear to be
corrupted by subtractive noise. If $\alpha > \beta$, more background pixels change to foreground,
so the images appear to have additive noise.
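The first three steps of the degradation procedure can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are lists of lists of 0/1 values, pixels adjacent to the opposite class are assigned distance 1 (the source does not say whether boundary pixels start at 0 or 1), and the final morphological closing step is omitted. The function names `boundary_distance`, `flip_probability` and `degrade` are hypothetical.

```python
import math
import random
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def boundary_distance(img):
    """City-block (4-neighbor) distance from each pixel to the nearest pixel
    of the opposite value; math.inf everywhere if the image is uniform."""
    h, w = len(img), len(img[0])
    dist = [[math.inf] * w for _ in range(h)]
    queue = deque()
    # Assumption: pixels touching a pixel of the opposite value get distance 1.
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx] != img[y][x]:
                    dist[y][x] = 1
                    queue.append((y, x))
                    break
    # Breadth-first propagation away from the boundary pixels.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] == math.inf:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def flip_probability(d, amplitude, decay, eta):
    """amplitude * exp(-decay * d^2) + eta, clipped to at most 1."""
    if math.isinf(d):
        return min(1.0, eta)
    return min(1.0, amplitude * math.exp(-decay * d * d) + eta)

def degrade(img, alpha0, alpha, beta0, beta, eta, rng):
    """Steps 1-3 of the procedure; the closing step (4) is not applied here."""
    dist = boundary_distance(img)
    out = []
    for y, row in enumerate(img):
        new_row = []
        for x, value in enumerate(row):
            if value == 1:  # foreground pixel
                p = flip_probability(dist[y][x], alpha0, alpha, eta)
            else:           # background pixel
                p = flip_probability(dist[y][x], beta0, beta, eta)
            new_row.append(1 - value if rng.random() < p else value)
        out.append(new_row)
    return out
```

A call such as `degrade(img, 1.0, 0.7, 1.0, 3.0, 0.0, random.Random(0))` would correspond to the parameter ordering $(\alpha_0, \alpha, \beta_0, \beta)$ used in the figure captions, with $\eta = 0$ chosen here only for illustration.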

3  The  Estimation  Algorithm 

In this section, we briefly describe a parameter estimation algorithm [KZ01] for the
degradation model described in the previous section. The basic assumption of this
algorithm is that two document images with similar noise should have neighborhood pattern
distributions that look similar. Thus we can estimate model parameters by degrading
documents with various model parameter values and choosing the values that give rise to a
neighborhood pattern distribution that is very close to that of the given degraded image.

Let $P$ be a set of neighborhood bit patterns and $p$ be an arbitrary element in the set
$P$. If we choose a $3 \times 3$ neighborhood, we will have a total of 512 different patterns. Let
$H_R$ denote the pattern distribution of a degraded image $R$, so that $H_R(p)$, where $p \in P$,
is the number of times the pattern $p$ occurs in the binary image $R$. Using mathematical
morphology, we can define $H_R(p)$ more precisely:

$$H_R(p) = \#\{R \ominus p\}. \qquad (1)$$
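As an illustration of the pattern distribution $H_R$, the following sketch counts $3 \times 3$ neighborhood patterns by direct scanning rather than by morphological operations. The encoding of a neighborhood as a 9-bit integer (row-major, most significant bit first) and the decision to skip the one-pixel image border are assumptions made for this example; the names `pattern_at` and `pattern_histogram` are hypothetical.

```python
from collections import Counter

def pattern_at(img, y, x):
    """Encode the 3x3 neighborhood centered at (y, x) as an integer in 0..511.
    Bits are read row by row, so the center pixel ends up at bit 4."""
    code = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            code = (code << 1) | img[y + dy][x + dx]
    return code

def pattern_histogram(img):
    """Count how often each 3x3 pattern occurs at interior pixel positions."""
    h, w = len(img), len(img[0])
    return Counter(
        pattern_at(img, y, x)
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    )
```

With this encoding, an all-foreground neighborhood maps to 511 and an isolated foreground pixel maps to 16.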

We say that two images $R$ and $S_\Theta$, where $S_\Theta$ is an image degraded by simulation with
parameters $\Theta$, are similar if the corresponding pattern distributions $H_R$ and $H_{S_\Theta}$ are
similar. To test the similarity of two pattern distributions, we use the
Kolmogorov-Smirnov test [Mas51]. Let $KS(H_R, H_{S_\Theta})$
denote the KS test p-value for the null hypothesis that the two distributions are the
same. We use this p-value as the objective function that the estimation process tries
to maximize. That is,

$$\hat{\Theta} = \arg\max_{\Theta} KS(H_R, H_{S_\Theta}) \qquad (2)$$

Conventional optimization algorithms typically need the functional form of the objective
function. However, in our case, since $S_\Theta$ is computed by simulation, it is impossible
to use standard derivative-based approaches. We thus choose the simplex optimization
algorithm [NM65], which needs only function values to maximize or minimize
functions, to optimize $KS$. To reduce the risk of stopping at a local optimum, we select multiple
random starting locations and pick the solution with the best objective value, that is,
the highest p-value in Equation (2).
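The objective can be illustrated with a small two-sample Kolmogorov-Smirnov computation on pattern histograms. This is only a sketch under several assumptions: histograms are dictionaries mapping pattern codes 0..511 to counts, the cumulative distributions follow the numeric order of those codes (the source does not state an ordering), and the p-value uses the standard asymptotic Kolmogorov series rather than whatever routine the authors used. The names `ks_statistic` and `ks_pvalue` are hypothetical.

```python
import math

def ks_statistic(hist_a, hist_b, num_patterns=512):
    """Largest absolute difference between the two empirical CDFs,
    accumulated over pattern codes 0..num_patterns-1."""
    n_a = sum(hist_a.values())
    n_b = sum(hist_b.values())
    cdf_a = cdf_b = 0.0
    d = 0.0
    for p in range(num_patterns):
        cdf_a += hist_a.get(p, 0) / n_a
        cdf_b += hist_b.get(p, 0) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_pvalue(d, n_a, n_b, terms=100):
    """Asymptotic p-value: 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2),
    with lam = sqrt(n_a * n_b / (n_a + n_b)) * d."""
    lam = math.sqrt(n_a * n_b / (n_a + n_b)) * d
    if lam < 0.2:
        # The series converges slowly near zero, where the p-value is ~1.
        return 1.0
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2.0 * total))
```

A derivative-free optimizer such as the simplex method would then search over the model parameters for the simulated histogram with the highest p-value, restarting from several random points. Because neighboring pixel patterns are not independent samples, the p-value here is best read as a similarity score rather than a calibrated significance level.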

4  The  Restoration  Algorithm 

In this section we demonstrate that by using our degradation model, we can design filters
in a more concise and efficient way, and the corresponding restoration procedure is thus
simple and easily implemented.

Compared  to  other  morphological  restoration  algorithms  [LD92,  LH96],  our  method 
is  model-based.  We  always  assume  that  the  degraded  image  can  be  characterized  by  a 
set  of  parameters  that  can  be  estimated  by  using  the  algorithm  described  in  the  previous 
section.  Our  algorithm  has  two  stages,  a  training  stage  and  a  restoration  stage. 

Suppose we have an ideal image $I$ and a corresponding degraded image $S_\Theta$, where $\Theta$ is
the estimated parameter set used to generate $S_\Theta$ from $I$. The training stage is responsible
for computing the conditional distribution between the noise pattern pairs in the image
pair $(I, S_\Theta)$. During the training stage, we first scan $S_\Theta$. Next we obtain its noise pattern
$P_S(x, y)$ at location $(x, y)$. We also obtain the point pattern at location $(x, y)$ in the
ideal image $I$: $P_I(x, y)$. From the pattern pairs $(P_I(x, y), P_S(x, y))$, we form the pattern
distribution of the ideal image $I$ conditioned on the degraded image $S_\Theta$: $H_\Theta(P_I \mid P_S)$.

The restoration stage takes place after estimating the model parameters of the
degraded image. Let $Q$ represent the restored version of $S_\Theta$. Given the pattern
$P_S(x, y)$ at location $(x, y)$ of the degraded image $S_\Theta$, the restored pattern $P_Q(x, y)$ in $Q$
is computed as

$$P_Q(x, y) = \arg\max_{p \in P_I} H_\Theta(p \mid P_S(x, y)) \qquad (3)$$

Equation (3) is essentially the Maximum Likelihood (ML) estimate of the pattern based
on the known parameter $\Theta$.
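The two stages can be sketched as a table built from co-occurring pattern pairs, followed by a lookup per pixel. The sketch below makes assumptions that the source does not spell out: patterns are $3 \times 3$ neighborhoods encoded as 9-bit integers, only the center pixel of the most frequent ideal pattern is written to the output, noisy patterns never seen in training are left unchanged, and border pixels are copied as they are. The names `pattern_at`, `train_lookup_table` and `restore` are hypothetical.

```python
from collections import Counter, defaultdict

def pattern_at(img, y, x):
    """Encode the 3x3 neighborhood centered at (y, x) as an integer in 0..511;
    the center pixel is stored at bit 4."""
    code = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            code = (code << 1) | img[y + dy][x + dx]
    return code

def train_lookup_table(ideal, degraded):
    """Map each observed noisy pattern to the ideal pattern that co-occurs
    with it most often at the same location."""
    counts = defaultdict(Counter)
    h, w = len(ideal), len(ideal[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            noisy = pattern_at(degraded, y, x)
            clean = pattern_at(ideal, y, x)
            counts[noisy][clean] += 1
    return {noisy: c.most_common(1)[0][0] for noisy, c in counts.items()}

def restore(degraded, table):
    """Replace each interior pixel by the center pixel of the most likely
    ideal pattern for its noisy neighborhood."""
    h, w = len(degraded), len(degraded[0])
    out = [row[:] for row in degraded]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            noisy = pattern_at(degraded, y, x)
            clean = table.get(noisy, noisy)  # unseen patterns stay unchanged
            out[y][x] = (clean >> 4) & 1     # bit 4 is the center pixel
    return out
```

In practice the table would be trained on a synthetic pair produced with the estimated parameters and then applied to the real degraded image, so that training never requires the ideal version of the image being restored.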
Figure 2 shows an ideal image and its degraded versions with two different parameter
sets. Figure 3 shows four typical noise patterns in the degraded image in Figure 2(b) and
its conditional pattern distribution based on the corresponding ideal image in Figure 2(a).

Figure 2: (a) A typical ideal image; (b) Degraded version of (a) with parameters
(1.0, 0.7, 1.0, 3.0); (c) Degraded version of (a) with parameters (1.0, 3.0, 1.0, 0.7).

Figure 3: Four typical noise patterns are shown in the leftmost column. The pattern
entries in the other columns show possible ideal patterns and the corresponding
probabilities. The ideal image was degraded with parameter set (1.0, 0.7, 1.0, 3.0).


5  Experimental  Protocol  and  Results 

The experiment is illustrated in Figure 5. The basic idea is to compare the
OCR result of the degraded image with that of the restored one. The evaluation
software is provided by the University of Maryland. It compares the OCR outputs with
the corresponding groundtruth information and generates statistical information such as
character-level or word-level accuracy in a batch mode. We believe that the OCR
accuracy rate is a good and objective indicator of how well our algorithm improves
the overall image quality.
Figure 4: (a) The restored version of the image shown in Figure 2(b). (b) The restored
version of the image shown in Figure 2(c).


The  test  images  were  100  one-column  pages  of  English  Bible  that  were  typeset  using 
The  image  size  is  A4  with  12-point  font  size.  One  additional  image  was  typeset 
to  generate  pattern  distributions  for  the  estimation  process.  While  its  text  content  was 
different  from  that  of  the  100  test  images,  its  font  and  bigram  symbol  probabilities  had 
the  characteristics  of  the  test  images.  The  100  test  images  were  degraded  and  then 
categorized into ten groups, with each group possessing a unique parameter set. The
OCR product was FineReader 4.0, manufactured by ABBYY. Tables 1-10 give the OCR
accuracy before and after our restoration algorithm. Figures 6-15 show typical degraded
images and restored images with the corresponding parameter sets. We also compute the
image noise level (absolute mean error) for the purpose of comparison with morphological
filter-based algorithms. The foreground noise level (FNL) measures
how many black (foreground) pixels in the original image change to white (background),
and the background noise level (BNL) measures how many white pixels change
to black. They can be computed by logical operations between the ideal image ($I$)
and the degraded image ($D$) or restored image ($R$). The number of error flipping pixels (EFP)
summarizes both kinds of noise. Mathematically, the above three metrics can
be represented in terms of set operations:
Figure 5: Illustration of the experimental setup to compare OCR accuracy on restored
versus unrestored images.


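$$\mathrm{FNL} = \frac{\#\{I \setminus D\}}{\#\{I\}} \qquad (4)$$

$$\mathrm{BNL} = \frac{\#\{D \setminus I\}}{\#\{I^c\}} \qquad (5)$$

$$\mathrm{EFP} = \#\{I \oplus D\} \qquad (6)$$

where $\oplus$ denotes the XOR operation, $I^c$ is the background of $I$, and $\#$ is the cardinality
of the set (i.e. the number of foreground pixels in a binary image). For a restored image,
$D$ is replaced by $R$.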
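The three noise metrics can be computed with a single pass over the two images. In this sketch, FNL is normalized by the number of foreground pixels of the ideal image and BNL by its number of background pixels; that normalization is an assumption chosen to match the verbal definitions, and the function name `noise_metrics` is hypothetical. Images are lists of lists of 0/1 values of equal size.

```python
def noise_metrics(ideal, other):
    """Compare an ideal binary image with a degraded or restored one.

    Returns FNL (fraction of ideal foreground pixels that became background),
    BNL (fraction of ideal background pixels that became foreground) and
    EFP (total number of pixels whose value differs, i.e. the XOR count).
    """
    fg = bg = fg_lost = bg_gained = 0
    for row_i, row_o in zip(ideal, other):
        for a, b in zip(row_i, row_o):
            if a == 1:
                fg += 1
                if b == 0:
                    fg_lost += 1
            else:
                bg += 1
                if b == 1:
                    bg_gained += 1
    return {
        "FNL": fg_lost / fg if fg else 0.0,
        "BNL": bg_gained / bg if bg else 0.0,
        "EFP": fg_lost + bg_gained,
    }
```

Calling the function once with the degraded image and once with the restored image gives the two columns reported for the noise levels in each results table.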

From the test statistics, we see that our restoration algorithm decreases both the
OCR error rate and the image noise level. For instance, the decreases in OCR
error rate at the character and word levels range from 3.4% to 41.5% and from 1.0%
to 20.4% respectively, depending on which model parameters are associated with the
degraded images. In particular, we find that our algorithm performs better in restoring
images suffering from broken characters (Figures 8 and 9) than those that have blurred
characters (Figures 12 and 13). This suggests that the OCR product
is more vulnerable to broken characters, which have more subtractive noise.
In addition to the OCR error rate, our algorithm significantly decreases the image noise
level, by amounts ranging from 13.1% to 52.7%.
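The improvement figures quoted above are relative reductions in error counts. As a small worked example, the following sketch recomputes two of them from the error counts reported for the parameter set (0.6, 0.8, 1.0, 3.0); the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(errors_before, errors_after):
    """Fraction by which the error count dropped after restoration."""
    return (errors_before - errors_after) / errors_before

# Character errors: 775 on the degraded images, 670 on the restored images.
char_improvement = relative_reduction(775, 670)
# Word errors: 1093 on the degraded images, 1039 on the restored images.
word_improvement = relative_reduction(1093, 1039)

print(f"{char_improvement:.1%}")  # 13.5%
print(f"{word_improvement:.1%}")  # 4.9%
```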
Figure 6: (a) A sample degraded image with parameters (0.6, 0.8, 1.0, 3.0); (b) Restored
image of (a).


Table 1: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (0.6, 0.8, 1.0, 3.0)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   24660            24580
Num. of Correct Chars           23885            23910
Num. of Char Errors             775              670              13.5%
Num. of Words                   4855             4855
Num. of Correct Words           3762             3816
Num. of Word Errors             1093             1039             4.9%
Foreground Noise Level          16.1%            11.8%
Background Noise Level          0.19%            0.19%
Num. of Error Flipping Pixels   502659           409992           18.4%
Figure 7: (a) A sample degraded image with parameters (0.8, 0.8, 1.0, 3.0); (b) Restored
image of (a).


Table 2: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (0.8, 0.8, 1.0, 3.0)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   24391            24806
Num. of Correct Chars           23935            23999
Num. of Char Errors             996              807              19.0%
Num. of Words                   4953             4953
Num. of Correct Words           3737             3846
Num. of Word Errors             1216             1107             9.0%
Foreground Noise Level          22.2%            14.7%
Background Noise Level          0.18%            0.24%
Num. of Error Flipping Pixels   625228           516481           17.4%
Figure 8: (a) A sample degraded image with parameters (1.0, 0.8, 1.0, 3.0); (b) Restored
image of (a).


Table 3: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (1.0, 0.8, 1.0, 3.0)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   25651            25262
Num. of Correct Chars           23973            24280
Num. of Char Errors             1678             982              41.5%
Num. of Words                   4973             4958
Num. of Correct Words           3397             3703
Num. of Word Errors             1576             1255             20.36%
Foreground Noise Level          28.8%            15.7%
Background Noise Level          0.17%            0.30%
Num. of Error Flipping Pixels   768872           584919           23.9%
Figure 9: (a) A sample degraded image with parameters (1.0, 0.6, 1.0, 2.0); (b) Restored
image of (a).


Table 4: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (1.0, 0.6, 1.0, 2.0)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   27426            26370
Num. of Correct Chars           22584            23455
Num. of Char Errors             4842             2915             40.0%
Num. of Words                   5040             5031
Num. of Correct Words           2637             3089
Num. of Word Errors             2403             1942             19.2%
Foreground Noise Level          31.7%            24.5%
Background Noise Level          0.41%            0.43%
Num. of Error Flipping Pixels   1026668          892519           13.1%
Figure 10: (a) A sample degraded image with parameters (1.0, 0.8, 1.0, 2.0); (b) Restored
image of (a).


Table 5: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (1.0, 0.8, 1.0, 2.0)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   25918            25771
Num. of Correct Chars           24324            24408
Num. of Char Errors             1594             1363             14.5%
Num. of Words                   5037             5038
Num. of Correct Words           3465             3581
Num. of Word Errors             1572             1457             7.3%
Foreground Noise Level          24.3%            21.0%
Background Noise Level          0.42%            0.37%
Num. of Error Flipping Pixels   843493           758692           13.1%

Figure 11: (a) A sample degraded image with parameters (1.0, 1.0, 1.0, 2.0); (b) Restored
image of (a).


Table 6: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (1.0, 1.0, 1.0, 2.0)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   25001            24950
Num. of Correct Chars           23952            24003
Num. of Char Errors             1049             947              9.7%
Num. of Words                   4887             4889
Num. of Correct Words           3614             3682
Num. of Word Errors             1273             1207             5.2%
Foreground Noise Level          19.1%            18.6%
Background Noise Level          0.42%            0.29%
Num. of Error Flipping Pixels   750851           629294           16.2%
Figure 12: (a) A sample degraded image with parameters (1.0, 1.5, 1.0, 0.6); (b) Restored
image of (a).


Table 7: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (1.0, 1.5, 1.0, 0.6)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   23612            23709
Num. of Correct Chars           23065            23193
Num. of Char Errors             548              516              5.8%
Num. of Words                   4582             4586
Num. of Correct Words           3639             3659
Num. of Word Errors             943              927              1.7%
Foreground Noise Level          2.4%             17.4%
Background Noise Level          1.96%            0.52%
Num. of Error Flipping Pixels   1656108          783032           52.7%


Figure 13: (a) A sample degraded image with parameters (1.0, 1.5, 1.0, 0.8); (b) Restored
image of (a).


Table 8: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (1.0, 1.5, 1.0, 0.8)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   24401            24558
Num. of Correct Chars           23827            24037
Num. of Char Errors             574              521              9.2%
Num. of Words                   4737             4742
Num. of Correct Words           3748             3787
Num. of Word Errors             989              955              3.4%
Foreground Noise Level          3.8%             20.1%
Background Noise Level          1.53%            0.4%
Num. of Error Flipping Pixels   1337753          752212           43.7%
Figure 14: (a) A sample degraded image with parameters (1.0, 1.5, 1.0, 1.0); (b) Restored
image of (a).


Table 9: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (1.0, 1.5, 1.0, 1.0)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   24717            24834
Num. of Correct Chars           24095            24233
Num. of Char Errors             622              601              3.4%
Num. of Words                   4798             4804
Num. of Correct Words           3757             3774
Num. of Word Errors             1041             1030             1.0%
Foreground Noise Level          5.3%             21.0%
Background Noise Level          1.2%             0.4%
Num. of Error Flipping Pixels   1098220          700650           36.2%
Figure 15: (a) A sample degraded image with parameters (1.0, 2.0, 1.0, 1.0); (b) Restored
image of (a).


Table 10: OCR error improvement with parameters $(\alpha_0, \alpha, \beta_0, \beta) = (1.0, 2.0, 1.0, 1.0)$.

OCR Result                      Degraded Image   Restored Image   Improvement
Num. of Chars                   23604            23663
Num. of Correct Chars           23049            23131
Num. of Char Errors             555              532              4.1%
Num. of Words                   4569             4572
Num. of Correct Words           3614             3636
Num. of Word Errors             955              936              2.0%
Foreground Noise Level          3.0%             18.2%
Background Noise Level          1.17%            0.28%
Num. of Error Flipping Pixels   1018179          604504           40.6%

6  Summary 

A model-based document image restoration algorithm has been proposed based on the
estimated parameters of the degradation model. We first use the degradation model
to estimate the probability of an ideal binary pattern, given the noisy observed pattern.
This probability is estimated by degrading noise-free document images and then
computing the frequency of corresponding noise-free and noisy pattern pairs. This conditional
probability is then used to construct a lookup table to restore the noisy images. The
impact of the restoration process is then quantified by computing the decrease in OCR
word and character error rates.